Availability Modeling and Evaluation on High Performance Cluster Computing Systems

Authors

  • Hertong Song
  • Chokchai Leangsuksun
  • Raja Nassar
Abstract

Cluster computing has attracted growing attention from both industry and academia for its enormous computing power and scalability. A Beowulf-type cluster, for example, is a typical High Performance Computing (HPC) cluster system. Availability, as a key attribute of the system, needs to be considered at the system design stage and monitored at mission time. Moreover, system monitoring is essential for identifying defects and ensuring that the system meets its availability requirement. This paper investigates novel solutions that combine availability modeling, model evaluation, and data analysis, the three key components of the investigation, in a single framework. The general availability concepts and modeling techniques are briefly reviewed. The system's availability model is divided into submodels according to their functionality. Furthermore, an object-oriented Markov model specification has been developed to facilitate availability modeling and runtime configuration. Numerical solutions for Markov models are examined, with particular attention to the uniformization method. The paper also presents a monitoring and data analysis framework, which is responsible for failure analysis and availability reconfiguration.

ACM Classification: D.2.11, D.2.12, D.2.13
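As a rough illustration of the uniformization method named in the abstract, the sketch below computes the transient availability of a two-state (up/down) continuous-time Markov chain. The two-state model, the function names, and the failure and repair rates are assumptions chosen for illustration; they are not taken from the paper, whose submodels are more detailed.

```python
import math


def uniformization(Q, pi0, t, tol=1e-12):
    """Transient state probabilities pi(t) of a CTMC with generator matrix Q.

    Suitable for moderate values of rate * t; for very large values
    exp(-rate * t) underflows and a scaled variant would be needed.
    """
    n = len(Q)
    # Uniformization rate: the largest exit rate of any state.
    rate = max(-Q[i][i] for i in range(n))
    if rate == 0.0 or t == 0.0:
        return list(pi0)
    # Embedded discrete-time chain: P = I + Q / rate.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    v = list(pi0)                 # v holds pi0 * P^k
    weight = math.exp(-rate * t)  # Poisson(rate * t) probability of k jumps
    if weight == 0.0:
        raise ValueError("rate * t is too large for this simple sketch")
    result = [weight * x for x in v]
    covered = weight              # Poisson mass accumulated so far
    k = 0
    while 1.0 - covered > tol:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k
        result = [r + weight * x for r, x in zip(result, v)]
        covered += weight
    return result


def two_state_availability(failure_rate, repair_rate, t):
    """Probability that a single repairable component is up at time t."""
    # State 0 = up, state 1 = down; the component starts in the up state.
    Q = [[-failure_rate, failure_rate],
         [repair_rate, -repair_rate]]
    return uniformization(Q, [1.0, 0.0], t)[0]


# Hypothetical rates (per hour), for illustration only.
availability = two_state_availability(0.001, 0.1, 10.0)
```

For this two-state case the result can be checked against the closed form A(t) = μ/(λ+μ) + λ/(λ+μ)·e^(−(λ+μ)t), where λ is the failure rate and μ the repair rate.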

Similar resources

Green Energy-aware task scheduling using the DVFS technique in Cloud Computing

Nowadays, energy consumption has become a critical issue in high-performance distributed computing systems, so green computing tries to reduce energy consumption, carbon footprint and CO2 emissions in high performance computing systems (HPCs) such as clusters, Grid and Cloud that a large number of parallel. Reducing energy consumption for high end computing can bring various benefits such as red...

The Modeling and Dependability Analysis of High Availability OSCAR Cluster System

OSCAR is widely used for building and maintaining high-performance parallel computing systems. In many cases, the high availability requirement becomes as critical as high performance. In this paper, the current OSCAR cluster system is introduced. Some high availability considerations are discussed and the high availability OSCAR cluster system is presented. Continuous Time Markov Chain models are b...

A New Availability Concept for (n, k)-way Cluster Systems Regarding Waiting Time

It is necessary to have the precise definition of available performance of high availability systems that can represent the availability and performability of the systems altogether. However, the difference between numeric scales of availability and performance metrics such as waiting time makes quantitative evaluation difficult. A number of previous studies on availability do not include a per...

Parallel computing using MPI and OpenMP on self-configured platform, UMZHPC.

Parallel computing is a topic of interest for a broad scientific community since it facilitates many time-consuming algorithms in different application domains. In this paper, we introduce a novel platform for parallel computing using the MPI and OpenMP programming models on a set of networked PCs. UMZHPC is a free Linux-based parallel computing infrastructure that has been developed to cr...

A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints

One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...

Journal title:
  • Journal of Research and Practice in Information Technology

Volume 38, Issue

Pages -

Publication date: 2006